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We consider the problem of finding for a given iV-tuple of polynomials the closest iV-tuple that has a com- 
mon divisor of degree at least d. Extended weighted Euclidean seminorm of the coefficients is used as a 
measure of closeness. Two equivalent representations of the problem are considered: (i) direct optimization 
f ) ' over the common divisors and cofactors (image representation), and (ii) Sylvester low-rank approximation 

(kernel representation). We use the duality between least-squares and least-norm problems to show that (i) 
and (ii) are closely related to mosaic Hankel low-rank approximation. This allows us to apply to the ap- 
proximate common divisor problem recent results on complexity and accuracy of computations for mosaic 
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Hankel low-rank approximation. We develop optimization methods based on the variable projection prin- 
ciple both for image and kernel representation. These methods have linear complexity in the degrees of the 
polynomials if either d is small or d is of the same order as the degrees of the polynomials. We provide a 
software implementation of the developed methods, which is based on a software package for structured 
low-rank approximation. 
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1. Introduction 

In many applied problems there is a need to compute a (greatest) common divisor of a set 
of polynomials with inexact coefficients. An approximation may be needed because the input 
polynomial coefficients are subject to measurement noise or they have a representation with 
limited precision. Ex a mples include problem s in signal processing and system identification 
(Agra walet all l2004t iGaubitchet al. . 20051). c omputer-aided geometric d esign (JEmiris et 



2013b . blind image deblurring (ILi et all BOlOj). control of linear systems JKhare et all |201 



and approximate factorization of polynomials (lZena, :2008). 
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In this paper, we consider the problem of finding the closest polynomials to given ones such 
that they have a common divisor of degree at least d, where d is a fixed parameter. This problem 
is an important part of the approximate greatest common divisor problem. We do not consider 
the problem of selecting the degree of the common divisor. (See the discussion in Section 1.1.) 
The measure of closeness between given and approximating TV-tuples of polynomials is chosen 
to be the weighted Euclidean semi-norm. (For a review of other possible closeness measures, see 
dPrnL bOOlh .-) 

1.1. Problem statement and related formulations 

Let P„ be the set of univariate complex (or real) polynomials of degree at most n > 0, i.e. 

F„ := {po+piz + ■ ■ -+p n z n | Pj E ¥} C W[z], (1) 

where F is C or R. As discussed in Section 2.2, P„ is isomorphic to the space of coefficients 
F" +1 . 

Let n = [m ■ ■ ■ it,n] t 6 N n be fixed degrees, n m in '■= minn^, and denote P n := 
P ni x ■ ■ • x P rlJV be the set of TV-tuples of polynomials with these degrees. We assume that P n 
is equipped with some distance dist(-, ■), which is continuous in the Euclidean topology. 

In this paper, we address the problem of finding the distance from a given TV-tuple to the set 
of TV-tuples that have a greatest common divisor (GCD) of degree at least d > 0: 

% := {(p^, . . . ,j$W) : deggcd^ 1 ), . . . ,pW) > d} . (2) 

Problem 1 (Approximate GCD with bounded degree). Given p = (p' 1 ', . . . ,p^ N ^) € P n and 
d : < d < ri m i n , find the distance 

dist (p,%) := min dist (p, p) . (3) 

The set % is closed|JJ (see Lemma 15 in A) in the Euclidean topology, and hence Problem 1 
is well-posed. Closely related to Problem 1 is the problem of finding the maximal GCD degree 
in the e-ball around the given TV-tuple. 

Problem 2 (e-GCD degree). Given p = (p (1) , . . . ,p (N) ) G P„ and e > 0, find 

d*(e)= max deggcdp subject to dist (p, p) < e. m 



As noted bv lEmiris et al . ( 1996); Rupprecht ( 1999), % form a descending chain of sets: 



Fn=?0^1^2>'O % lmm D &n min +l = 0- (5) 

Therefore, the solution of Problem 2 is the only integer d* — d* (e) that satisfies 

dist(p,%*)^ £ and dist(p,%»+i) > £• (6) 

Henc e, being able t o solve Problem 1 allows us to solve Problem 2, and vice versa (see also 
(IRupprechti 1 1 9991) and dMarkovskvil2012i §2.3)). Indeed, we can find d* satisfying (6) by solving 



Problem 1 for different d (for example, using bisection). Vice versa, by solving Problem 2 for 
varying e, we can detect a jump of d* (e) from d to d + 1, which will correspond to the solution 
of Problem 1 . 



1 lEmiris et alj U 996ft provide an exposition of algebraic-geometric properties of the sets ^ d in the case N 



Many authors were focused on finding maximal GCD degree within a given ball (see Prob- 
lem 2), however it was recognized that besides the maximal degree of interest is also the closest 
TV-tuple of polynomi als with GCD of th i s degree. More precisely, t he following problem formu- 
lation was adopted bv lRupprechil l[l999); Zen g and Davtonl (J2004 : lBini and Boitol d2010h . 
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Problem 3 (Approximate GCD with tolerance e). Given p 

(1) Find the degree d*{e) of the e-GCD (solve Problem 2). 

(2) Find the closest TV-tuple p* with the computed degree (solve Problem 1 with d 



d*). 



Usually, the candidate degree d* on the first step of Problem 3 is found by some heuristic. 
The heuristic can be based on finding the gap between the singular values of the Sylvester matrix 
( Corless et all 1995), tracking the las t singu lar value of the sequence of Sylvester sub-resultants 
(IRupprechtl Il999t IZeng and Davtonl 120041) or , more recently, look ing at the principal angles 
between the columns of sub-resultant matrices (IWinkler et al.l 120121) . 

The distance measures used are typically induced by a norm || • || in P n , where in most cases 
|| • || is a we ighted Euclidean norm. This choice can be justified from a statistical point of view 
Agrawal et al.l (120041) . or because it makes the problem more computationally tractable. In addi- 
tion, it can be shown that the Euclidean distance is equivalent to minimizing a weighted sum of 
squared sines of angles between the polynomials, therefore this approximation criterion depends 
only on the roots of the approximating polynomials (se e the discu s sion i n Section 4.4). Other 
norms were also considered in the literature, for example IHitz et alj (U.999) use £ x norm for the 
nearest singular polynomial problem (computing the approximate GCD of a polynomial and its 
derivatives). 

In this paper, we consider the distance defined by the weighted extended semi-norm on P n : 



dist(p,q) := ||p-q|| 
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where it/ fc ) e [0, +oo] nk+1 , w = (w 



(1) 



, w 



W)and 



Note that if Wj £ (0, oo), then 

J k ) _ 



3=0 



VjP jPj . 



(7) 



is a norm. 



If an infinite weight w) = oo is present, we formally assume • oo = in (7) and require the 



(k) 

distance to be finite in (3). Then, w) = oo is equivalent to the equality constraint p 



CO _ d k ) 



Pj 
(*0 



imposed on a separate coefficient (for example, a monicity constraint). A zero weight w 

makes the coefficient p)- ' irrelevant, and the resulting solution p does not depend on jr. ' . 
Therefore, we may assume that the coefficient is und efined (which correspond s to the case of 
missing observations in data modeling problems (M arkovskv and Usevichil2013l) ). 



1.2. Previous work on approximate GCD with bounded degree 

Problem 1 is non-convex and has no analytic solutio n. Therefore, except for recent papers 
dCheze et all l201lt iLi et all 120081; iKaltofen et ail 120081) . the problem is typically solved with 
local optimization methods, starting from an initial approximation. (In the methods for solving 
Problem 3, the initial approximation is computed on the first step of the e-GCD degree selection.) 



Most of the optimization methods for Problem 1 are based on two differ ent equivalent repre- 
sentations of the set % (mentioned in dEmiris et all 1 1996tlRupprechj.il 9991) ): (i) the set of prod- 
ucts of common divisors and cofactors and (ii) the set of vectors with rank-deficient Sylvester 
sub-resultant matrices. We will use the term image representation for (i) since % is represented 
as the image of a multiplication mapping, and kernel representation for (ii) since % is repre- 
sented as the kernel of the product of maximal minors of a structured matrix. 

In the kernel representation, the following set is used instead of %: 



{p € P„ : yP (p) is rank deficient}, 



(8) 



*>« 



where SP\ (p) is a Sylvester s ub-resultan t matri x (for N = 2) or a g eneralized Sylvester 



subresultant matrix proposed by iRupprechtl (11999b : iKaltofen et al.1 d2006l) (for N > 2). Note 
that for N > 2 the set defined in (8) includes extra polynomials which do not have a com- 
m on divisor (along with the polynomials from %). An alternative matrix structure, proposed 
bv lAgrawal et al.1 (120041) . overcomes this deficiency and handles properly the non-generic cases. 
(For more details, see the discussion in Section 5.2.) 

In optimization methods, the rank deficiency constraint is usually parametrized by a vector u 
in the right kernel of Sf\ (p), and the following problem is considered: 



minimize dist(p,p) subject to S*} (p)u = 0. 



(9) 



p,u^O 



For local optimization of (9), in a numb er of papers the constraint is replace d by a penalty, 
and a Gauss-Newton iteration is applied (IKaltofen et all l2006t iLi et all 120071) . This is known 
as structured total-least norm approach (STLN), which has complexity 0(n 2 ). Note that in the 
STLN approach the last coordinate of u is fixed to 1 (which corresponds to a monic polynomial), 
and some solutions of (9) correspond to ill-posed cases in STLN. Another appro ach to (9) is used 
in the recently proposed GPGCD method with complexity 0(n z ) (ITeruil. 120131) . 
In the image representation, the set §f n is replaced by the following set: 



^:={(.9 (1) V 



-,{N) 



h) : gU e 



- rii—di ■ 



,9 {N) & 



-d,h £ 



M{o}} 



(10) 



Note that for complex polynomials 



&d, but for real polynomials 



,^d only if d is 



even, and % = J^ U &d+\ if d is odd (see Lemma 14 in A). 

Problem 1 thus can be reduced to finding the distance from an A^-tuple p to the set J^, which 
is the problem of minimization of the function 



/( ff ( 1 ),..., 5 W,/ l ):=dist(p,( 5 ( 1 )/ l ,..., ff W/ l )). 



(ID 



This is a nonlinear least-squares proble m. A prelim i nary a nalysis of problem (11) and different 
optim ization strategies was performed bv lChin et al. (11998b. In the method UVGCD (IZeng and Daytonl 
20041) implemented in the package ApaTools dZena, 120081). the cost fu nction is minimized us- 
ing the Gauss-Newton method. In the Fastgcd method (Bin i and Bokol 120101) . the authors use a 
combination of the Gauss-Newton method and a line search method. 

A large class of methods i s based on the variable projection principle (going back to the work 
of iGolub and Pereyral (119731) ). The variable projection principle is based on the fact that finding 
the minimum of / for a given h 



f{h) = 



fig 



(i) 



9 {N \h) 



(12) 



,9 



(N) 



is a linear least squares problem and has a closed-form solution. Thus the variables g^ , 
are eliminated and the search space is reduced from 0(n) to d + 1 optimization variables, 
which is beneficial if d is small compared to n. Variable projection can also be applied if d is 
large and the cofactors have small dimensions (in th is case, the common divisor h is eliminated 
dStoica and Soderstr dm l Il997t IXgrawal et all 12004 ). 

ICorless et alj (11995I) showed that the function f(h) can be evaluated in 0(dn) flops for 
uniform weights (unweighted Euclidean norm), and a preliminary analysis of the accurac y of 
the co mputational procedure was presented (Section 3.3 contains more extensive an alysis). IPanl 



2001) showed that for general weights, the complexity is 0(dn log n). Independentlv. lMarkovskv and Van Huffel 
(2006) showed that not only the cost function, but also its gradient can be computed in O(dn) 



flops for uniform weights. Unfortunately, there is no publicly available package for N polyno- 
mials and non-uniform weights. 



The variable projec tion principle is a l so use d in the classical paper of Kar markar and Lakshman 
( 1 1998b (and also in JHitz and Kaltofenl[l998h ). where the authors show that for d — 1 ih(x) = 
x — a) the function f(h) has a simple expr ession thr o ugh a and the coefficient s of the given 
polynomials. This approach is extended by IZhi et alj (120041) ; lli and Zhil 02013b for symbolic 
computation of nearest singular polynomial (approximate GCD of a polynomial and its deriva- 
tives), where h has the form (x — a) d . 

The variable projection wass also used in recent methods of global optimization for Problem 1 . 
(Note that all the methods reviewed above are local optimization m ethods.) The global optimiza- 



tion methods include subdivision methods of 



programming relaxations of iLi et al. (2008); Kaltofen et al.1 (2008). 



Cheze et alj (2011) (for d = 1) and semi-definite 



1.3. Contribution of this paper 

In this paper, we show how the nonlinear least squares problem (minimization of (11)) is 
related to the weighted mosaic Hankel low-rank ap proximation. Using recent res ults on the com- 
plexity of mosaic Hankel low-rank approximation (lUsevich and Markovskv[|2013b . we show that 
the cost function (12), its first derivatives and an approximation of the Hessian c an be evalu ated 
in 0{d 2 n) flops, which is an improvement of th e results of .C orless e t al.l (1 1 9951) ; IPanl ( 1200 II) for 
d <C n. We also provide more detailed than in (ICorless et al.l Il995l) coverage on the accuracy 
of the computational process. We also show that the same r esults apply to the ca se of variable 
projection with respect to elimination of the common factor JAgrawal et all 120041) . In this case, 



,,v 



the computational complexity per iteration is 0((J\ y _-, (rik — d))' 2 n), which is linear for d with 
(rife — d) <C n. 

Next, we show that the variable projection can be also applied in the kernel representation 
(for Sylvester low -rank approximation), by eliminating p from (9). For N = 2 polynomials, the 
complexity of the evaluation of the cost function, its first- and second- order derivatives is also 
linear in n, if (rife — d) -C n. However, for N > 2 polynomials, the complexity is not linear in n 
even if (nt, — d) <C n, and there are some intrinsic computational issues. 

This paper is composed as follows. Section 2 contains necessary background, including com- 
monly used notation, details on the space P„ and common divisors, and an overview of least- 
squares and least-norm problems. In Section 3, we consider a class of least-squares and least- 
norm problems with matrix-polynomial multiplication matrices. We show their correspondence 
to mosaic-Hankel low -rank approximation, and provide a summary of results on complexity and 
accuracy of the evaluation of the cost function, gradient, and a Gauss-Newton approximation of 
the Hessian. In Sections 4 and 5, we show that the variable projection principle applied both 



to optimization over common divisors/cofactors and Sylvester low-rank approximation leads to 
least-squares and least-norm problems with matrix-polynomial multiplication matrix. 

In Section 4, we consider optimization in the image representation (11) and apply the variable 
projection principle either to common divisor, or to the cofactors. We also show that the same 
technique can be applied in the complex case. In Section 4.4, we show how minimization of the 
norm (7) is connected to minimization of the angles. In Section 5, we consider optimization in 
the kernel representation. In Section 5.1, w e compare generalized Sylvester subresultant matri- 
ces (Rupprechtlll999tlKaltofen et aUl200q) and alternative structured matrices of lAgrawal et al. 



(2004). In Section 5.2, we show how low-rank approximation of generalized Sylvester subresul- 



tant matrices can be reduced to mosaic Hankel low -rank approximation. In Section 5.4, we show 
how an initial approximation is obtained for local optimization methods. In Section 6, we pro- 



vide n umerical experiments that include comparison with the methods Fastgcd (Bi ni and Boito , 
2010b and UVGCD dZeng and Davtonll2004 implemented in MATLAB. 



The methods developed in this paper are implemented in MATLAB and are based on the 



SLRA package (SLR, 120131) described in dMarkovskv and Usev ich. 2012). The source code of 



the methods and experiments is publicly available and is embedded in the extended version of the 
paper using literate programming. The methods are currently implemented only for real-valued 
polynomials. (For complex polynomials only theoretical results are presented.) 

2. Preliminaries 

2.1. Matrix and vector notation 

In the paper, we adopt the following notation for matrices and vectors. 

F m xn — the space of m x n matrices with elements in a field F (where F is R or C ); 

Omxn, 0„ — m x n matrix of zeros, vector of n zeros; 

l mxn , l n — m x n matrix of ones, vector of n ones; 

/„ — n x n identity matrix; 

J n — n x n matrix with 1„ on the main anti-diagonal and all other elements equal to zero; 

diag(v) — the diagonal matrix constructed with the elements of v on the main diagonal; 

blkdiag(^4i , . . . , An) — the block-diagonal matrix composed of matrices A\ , . . . , Am\ 

Col(pi, . . . ,pn) — concatenation of vectors [pj ■ ■ ■ pJf] T . 
For a vector v = [v\ ■ ■ ■ vn] t £ K N and a scalar b £ K we define the following operations: 

v + b:= [vi + b ••• v N + b] T ; 

T,(v) — sum of the elements in the vector v; 

v^ 1 = [ i\ l ■ ■ ■ fjy 1 ] T — sum of the elements in the vector v; 

v < b <*=*> v k < b for all k £ {\, . . . , N}. 
We also summarize some commonly used symbols: 

n = [ni • • • «jv ] T — polynomial degrees; 

P = (p i ■ ■ ■ iP^) GPn — given A^-tuple of polynomials; 

d — common divisor degree; 

h — tentative common divisor; 

P = (P ) ■ ■ ■ iP^) GPn — approximating iV-tuple; 

g = (.g*- 1 ', . . . , g^ N ') £ P n -d — tentative cofactors. 



2.2. The set of polynomials with bounded degree 

The space P„ defined in (1) is isomorphic to F n+1 through the correspondence 



p(z)=p +piz + ---+p n z n eP„e P = [pa ••• p„] T eF" +1 . 

Note that we use the term "degree" for n (it may be different from the polynomial degree, since 
the leading coefficients are allowed to be zero). 

Remark 4. P„ \ {0} can be viewed as the space of univariate polynomials of degree d with the 
roots on the Riemannian sphere C U {oo} (where the number of zero leading coefficients is the 
multiplicity of the root oo), or more formally the homogeneous bivariate polynomials p(y , z) of 
degree n. 

The standard multiplication of polynomials (pW -p^)(z) ~p^(z)p^- 2 \z) (acting as P ni x 
P„ 2 —} ¥ ni+n2 ) has the following matrix representation: 

p« .p< 2 > =M„ 2 ( P W) P ( 2 ) =M ni (p^) P {1 \ 

where M TO (/i) is the multiplication matrix 

ho 



M m (h) 



h 



h d 



jp(m+d+l)x(m+l) 



"multmat . m" 7= 



function M = multmat (h, m) 

M = toeplitz ( [h (:); zeros (m, 1) ] 
endo 



[h(l); zeros (m, 1) ] ) 



For < d < n, we say that a polynomial h E Pd\ {0} divides a polynomial p E F n (or h 
is a divisor of p), denoted by h | p, if there exists a polynomial q e P n -d such that p = q ■ h. In 
particular, this definition includes the following special cases: 

• All polynomials /i<GP,i\{0},0<<i<7i,, are the divisors of a zero polynomial € P„. 

• A nonzero polynomial of zero degree h € Pq \ {0} is a divisor of any polynomial. 



2.3. N -tuples and greatest common divisors 

We also introduce the operation of multiplication of an TV-tuple g = (gW 
by a polynomial h € P^ as follows: 

g-h:=(g^-h,.. 

The polynomials p = (p^ 1 



9 {N) ) G P»_ 



,9 {N) -h). 



,P 



(N)) 



have a common divisor 



of degree d, if there exists h <G Pd \ {0} that divides all the polynomials p^ k \ or equivalently 
p = g • h for some g G P n -d- 



We also say that h <G P<j \ {0} is a greatest common divisor (GCD) if there is no common 
divisors in ¥# for any d! > d. Since 1 e Po is a divisor of any polynomial, a GCD always exists 
(but it is not unique). We denote by deg gcd p the degree of GCDs, and denote by gcd p the set 
of all GCD (which is a subset of Pdeggcd p)- In particular, if all p^\ . . . ,p( N > are zero, then 

gcdp = P„ mi „\{0}. 
Otherwise, gcd p is a punctured one-dimensional linear subspace of Pdeggcd p 
gcdp = {ah : aeF\{0}, h is a GCD of p {1 \ . . . ,p {N) }. 
2.4. Least-squares and least-norm problems 

In this paper, we use the duality between least-squares and least-norm problems. Next, we 
give an overview of these problems. 

Problem 5 (Weighted least-squares problem). Let A e M mx ", m > n, b e R m , and v € 

[0, oo) m , such that rankdiag(y / w)^4 = n 

minimize \\Ax — 6IL 2 ,. (13) 

The problem in (13) is the orthogonal projection of b on the image of A in the seminorm 
|| • \\ v . The solution can be easily seen if we rewrite the cost function in (13) as Hjt 1 ^!!!. where 
yi L S) _ di&g(y /r v)(Ax — b). Therefore, the solution of (13) is the following 

x* = (A T dia,g(v)A)~ 1 A T diag(v)b, 

y i LS ^ = diag(V^) (A(A T diag(^)yl)- 1 A T diag(w) - 7) b, 

M^ LS) = \\yi LS) \\ 2 2 = \\b\\ 2 v -b T dmg(v)A(A T dmg(v) A)- 1 A T di&g(v)b. 

The least-squares problem (13) is closely connected to the following dual problem: 

Problem 6 (Weighted least-norm problem). Let A e R mxn ,m >n,ce R m , andtu e (0,oo] m , 
such that rank diag ( V w ~ 1 ) A = n 

minimize lie— z\\t, subject to A T z = 0. (14) 

The problem (14) is to find the orthogonal projection of the vecto r c on the kernel of the matrix 
A in the norm || • \\ w . Changing variables as c — z — dia,g(\ / w~ 1 )y( LN ^ The cost function can 
be rewritten as H?/ 1 ^!!^ an d the constraint as A T dia,g(\ / w~ 1 )y^ LN ^ = A T c. Therefore, the 
solution of (14) is the following 

z* = c - dmg(w~ 1 )A(A T di&g(w~ 1 )A)~ 1 A T c, 

y{ Lir > = ding(\ / ^F T )A(A T diag(u;- 1 )A)- 1 A T c, (15) 

Mi LN) = \\y[ LN) \\l - cJA(A v dmg(w- 1 )A)- 1 A v c. 

One can see that the expressions for the solutions of (13) and (14) have a similar form. (Note the 
inversion of the weights.) In particular, if we have c = d\a,g(w^ 1 )b and v = w _1 , then 

Mi^ = \\b\\l-Ml LN \ 

(LS) ,. i /-s, (LN) {lb > 

yi = dmg(y/v)b - yl '. 



3. Least-squares and least-norm problems with parameters and multiplication matrices 



In this section, we consider the least-squares and least-norm problems for matrices depending 
on a parameter A — A(P). As we will show in Sections 4 and 5, application of the variable 
projection principle both in optimization over common divisors/cofactors (11) and Sylvester low- 
rank approximation (9) leads to minimization over P of the least-squares solution Mi (P) or 
the least-norm solution Mi ' (P). 

We show how the least-norm problem is connected to mosaic Hankel low-rank approxima- 
tion, which allows to use the efficient algorithms for evaluati ng Mj (P), its gradient an d 



a Gauss-Newton approximation of the Hess ian developed in dUsevich and Markovskvi 120131) . 
The software implementation is described in ( Markovskv and Usevichi 120121) . We also provide a 
summary of known results on the accuracy of computations. 

3.1. Mosaic Hankel matrices and least-squares problems with multiplication matrices 

Given a matrix P G R kxt , integer vector k = [ki ■ ■ ■ k^} T € N K , such that S(k) = k, and 
a number I > 1, we define the matrix-polynomial multiplication matrix: 



A*AP) 



M^pd'D) ••• Mj-iCPC 1 -*)) 



where P {l -^ e 



M^tP^D) ... M,_i(P(**)) 
i ki x 1 are the following sub-matrices of the matrix P: 
'p(iA) ... p(i,t)' 



(17) 



P = 



p(9>!) . . . p(<?>*) 



For two integer vectors k and 1 = [ l\ ■ ■ ■ /l ] T , I = S(p), we define .4k,i(P) as 

An,„(P) := blkdiag(^ Ml (P), . . .,A k ,i K (P)) ■ 
Let J$?k,i(c) £ R fcx( denote a Hankel matrix, generated from the vector c € R fc+ 



2-1 



, i.e. 



M,i{c) 



Ci C 2 • • • Q 

c 2 •'' ."' : 



Cfe 



Ck+l-l 



For two vectors k and 1 and a vector c = col(c^ 1,1 - ) , . . . ,c^ q,1 \ . . . ,c^ 1 ' jv - > , . . . , c^'^ ) with 
C (M) £ ^m k +ni-i we (jgfjjjg the mosaic Hankel matrix: (lUsevich and Markovskyl, 120131) 



^k,i(p) := 



^k K ,iM K ' x) )---^K,i L {c {K ' L) ) 



Then the following lemma holds true (see also dUsevich and Markovskvi I2013J Sec. 3)). 

Lemma 7. 

Al,i(P)c = <=> P T ^k,i(c) = 0. (18) 

By Lemma 7 and the results of Section 2.4, we have that the following problem 

m i n ll c -c£ subjectto P T J^,i(c) = (19) 

C 

is a least-norm problem (14) with A — A^\(P), and its solution (15) is given by 

Mi LN) {P) = \\yi LN \P)\\ 2 2 = S T (P)T- 1 (P)s(P), (20) 

yi LN \p) = diag(\4F T ).A k ,i(P)r- 1 (P) S (P)., (21) 

where 

T(P) :=^ l(p)diag(w -i )Ai(p)5 

s(P) := Au( p ) c - 
The solution of the following problem 

Mi LS) {P) = min \\A k ,i(P)x ~ b \\l ( 23 ) 

can be expressed using the correspondence (16), with c = y/vb and w = w" 1 . 

3.2. Computations and their complexity 



(22) 



In dUsevich and Markovskvi 120 1 3c iMarkovskv and UsevichL 120 12b . the following low-rank 



approximation problem is considered: given c, w, k and 1 

minimize \\c— c]|^ subjectto rankJ^ki(c) < k — t. (24) 

c 

By using the kernel representation of the rank constraint, 

rank^i(c) <k-t ■<=> P J J£k,i(e) = 0, for some P E R kxt with rankP = t, 
the problem (24) can be rewritten in an equivalent form (using Lemma 7) 

minimize Mi LN) (P), (25) 

Pen k , t 

where Hkt ^ R fcxt is the set representing all t-dimensional subspaces of M. k (the Grassmann 



manifold). See (Mar kovsky and UsevichL 120121) for the parametrizations used in the software 
implementation. 

The efficient algorithms of lUsevich and Markovskvi (120131) for local optimization of (25) are 
based on the fact that for fc, <C lj the matrix T(P) is 2/i<i-banded, where /i = max^ ki. Therefore, 
the Cholesky factor of T(P) is /xd-banded, and the term T~ 1 (P)s(P) i n the cost function (20) 
can b e computed efficiently using the Cholesky factorization of T(P). In (lUsevich and Markovskvi 



2013b has been shown that the cost function of (19), its gradient and a Gauss-Newton approxi- 
mation of the Hessian (2J7 T J, where J is the Jacobian of (21)) can all be evaluated in 0(t 3 kl) 
flops. Also the complexity of the cost function and gradient evaluation is 0(t 3 kl) when the 
weights are uniform across the subvectors c' IJ ) of the vector c. 

As we have shown in Section 3.1, the cost function in (23) has the same form as the cost 
function in (19). Note that in the special case of problem (23), when t = 1 and K = 1 and 
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the nor m is Euclidean , our c omplexity results for the cost function evaluation coincide with the 
ones of lCorless et al.l (11995b . For the case of nontrivial weights (and also t = 1 a nd K = 1) , 



the complexity results for the cost function were proposed to be 0(l 2 log(l)) by iPanl d200ll) . 
Therefore, our results improve over the results of D?anl (EOOlb . For computation of the gradi- 
ent, the complexity r esults for t = 1 and K = 1 and the Euclidean norm was presented by 
Markovsky and Van H uffel (2006). The algorithms a nd their complexity for c omputation of an 
approximation of the Hessian was first considered bv lUsevich and Markovskvl d20 13b . However, 
an approximation of the Hessian a llows to use more robust and efficient methods (for example, 
in (IMarkovsky and Usevichll2012b the Levenberg-Marquardt method is used for minimization of 

yi LN \P)). 

Note that the cost function and its derivatives in the problem 



minimize M. 

P£K k , t 



(LS) 



(P) 



= \\yi LS) \ 



(26) 



coincide (up to a li near transformation) with the co st function and its derivatives for the problem 
(25). The software (IMarkovsky and Usevichll2012b allows to perform local optimization for both 
problems (25) and (26). 

3.3. Accuracy of the computations 

The key computational procedure in evaluation of the cost function and its derivatives is the 
solution of the system of equations Tu = v using the Cholesky factorization. In what follows we 
consider the case of th e Euclidean norm (which can also be generalized to the case of block-wise 
weights dUsevich and Markovskvl 12013 )). In this case, 



r(p) = bikdiag (r (h) (p), . . .,t {1l \p)\ 



-.(0, 



where T^ (P) :— A^ l (P)Ak,i(P). Therefore, we can solve the system Tu — v separately, 
block by block. 

The accuracy of solving the system of equati ons T^(P)u = v with Cho lesk y factorization 
mainl y depends on the condition number of T^ dGolub and Van Loanlll996b . In dCorless et al. , 
1995b only a preliminary analysis of the conditioning of T^ G R ltxlt was performed (for the 
case t = 1 and K = 1). 

Let us introduce the matrix polynomial 



P(z) 



p(i,i)( z ) ... p(M)( z ) 



P(*.l)(*) ... P&.*){z) 



where P^ l ^\z) are the generating funct ions of the vectors Pw), Then T^ 1 ' is block-Toeplitz 

(27) 



ig runctio 

with the symbol (iMiranda and Tillii l2000l) 

F(*) =P T (z- 1 )P{z). 

Since F(z) is Hermitian for all z and F is continuous on the unit circle T, we can define 



ap := min X r 

T 



(F(z)) > 0, 6i 



maxA r 



(F(z)) < oo. 



Where A m ,„.(-B) and A mnl (fi) are the minimal and maximal eigenvalues of a matrix B. The 



esults of (Mir anda and Tillii l2000l) imply that L.-Jr^l \ a F and A max (F^)) /■ b 
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i.e. the eigenvalues are in the interval [of ; bp] and converge to the endpoints as I — > oo. Therefore, 
the condition number «2(r^) := A moa; (r , ^)/A m »n(r^) behaves as 

* 2 (r«)/^. 

If F(z) is positive definite on T, then «a(r®) < bp/ap < oo. Otherwise, « 2(r^) — > o o, and 
the rate of the convergence depends on the order of the zeros of (F(z)) on T (lSerraUl998l) . 



4. Optimization in the image representation 

In this section, we consider the problem of finding the distance to the set &&, defined in (10). 
As noted in Section 1.2, this corresponds to solving Problem 1 using the image representation. 

Problem 8 (Approximate common divisor). Given p, find 

min \\g-h-p\\l. (28) 

geP n _ d , 

h&¥ d 

Note that we can include the zero polynomial h in the search space of (28) (compared to the 
definition in (10)), since the zero polynomial is necessarily in J^. 

We see that the problem is a nonlinear least squares problem in both h and g. However, if we 
fix one of the variables, then the problem becomes linear least squares. Next, we show that in 
both cases problem (28) can be reduced to minimization of Mi (P) for suitably chosen k, 1 
and t. 

4.1. Variable projection with respect to hfor real polynomials 

In the case of variable projection with respect to h for real polynomials, the problem (8) is 
rewritten as the following double minimization problem 

minimize fi(h), where (29) 

her d \{0} 

fi(h) := minimum of (28) for fixed h. (30) 

Hence, 



f 1 (h) = Mi LS) (h) for k 



1 



1 = n — d and 



Note that / is invariant to multiplication by a scalar, i.e. fi(ah) — fi(h) for any a ^ 0, 
since the columns of A-k.\(h) and Ak. t i(ah) span the same subspace. Therefore, (29) is a min- 
imization on the projective space (which is a special case of the Grassmann manifold). Thus, 
the problem (29) is equivalent to the problem (26) and can be solved by the software package 



jMarkovskv and Usevichll2012l) . 

Thus we have reduced the number of optimization variables from E(n — d + 1) + d to d, 
which is beneficial if d <?C n. Indeed, as shown in Section 3.2, the cost function and its first- and 
second-order derivatives can be evaluated in linear time in n. 
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{ GCD common parameters 13a ) = 

if "exist (' opt ' , 'var') 

opt = struct () ; 
end 

p = reshape (p, length (p) , 1); 
bfp = cell2mat (p) ; 
bfn = cellf un (Slength, p) - 1 ; 
bfell = bfn - d;o 

Fragment referenced in 14a, 15, 19b, 20, 21a. 



( Calculate opt.hini if it is empty 13b ) = 

if ( " isf ield (opt , 'hini') | | isempty (opt . hini ) ) 
( Calculate opt.gini if it is empty 14b } 

opt.hini = lsdivmult (p, d, opt.gini); 
endo 

Fragment referenced in 14a. 



(Split ph into separate polynomials 13c) = 

ph = mat2cell(ph, bfn+1, [l]);o 

Fragment referenced in 14a, 15, 19b. 



( Optimize NLS ph with SLRA solver 13d } = 

s . gcd = 1; 

if isempty (w) 

[ph, info] = slra (bfp, s, sum (s . m) -1, opt); 
else 

bfw = cell2mat (w) ; 

s . w = 1 . /bfw; 

[ph, info] = slra (bfw. *bfp, s, sum (s .m) -1, opt); 
endo 

Fragment referenced in 14a, 15. 
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"gcd_nls . m" 14a= 

function [ph, info] = gcd_nls (p, w, d, opt) 

{ GCD common parameters 13a ) 

{ Calculate opt.hini if it is empty 13b ) 

opt.Rini = opt.hini'; 

s = struct ('m', d+1, ' n' , bfn-d+1); 

{ Optimize NLS ph with SLRA solver 13d ) 

{ Split ph into separate polynomials 13c ) 
endo 



From the results of Section 3.3, we have that the accuracy of the computations are determined by 
the properties of the function F(z), which in this case has the form F(z) = |/i(z)||. Therefore, 

the computations become ill-conditioned if the tentative common divisor has roots on or close to 

the complex unit circle T. Note that this is similar to the results on conditioning of the Sylvester 



matrix (IZeng and Davtoiu 12004 Prop. 6) 



4.2. Variable projection with respect to g( k > for real polynomials 

In the case of variable projection with respect to g^ for real polynomials, the problem (8) is 
rewritten as the following double minimization problem 

minimize /2(g), where (31) 

geP n -d\{o} 

H (§) := minimum of (28) for fixed g. (32) 



Then the cost function has the form 



_ aA ls ) 



/ 2 (g)=Mr j (g) for k = n-d, 1 



d+1 



and v = w. 



This type of variable projection was used in (IStoica and Soderstromlll997tlAgrawal et all 120041) 



(under the name of Common Factor Estimation). 

We also have that /^(cng) = /2(g) for a/0, since the columns of ,4k. i(g) and -4k.i(ag) 

span the same subspace. Therefore, problem (31) is also an instance of problem (26). 
( Calculate opt.gini if it is empty 14b ) = 

if ( "isf ield (opt, 'gini') | | isempty (opt . gini) ) 

opt.gini = g_ini (p, d) ; 
endo 

Fragment referenced in 13b, 15. 
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"gcd_cofe.m" 15= 

function [ph, info] = gcd_cof e (p, w, d, opt) 

{ GCD common parameters 13a ) 

{ Calculate opt.gini if it is empty 14b ) 

opt.Rini = opt.gini'; 

s = struct ('m', bfn-d+1, 'n', d+1); 

{ Optimize NLS ph with SLRA solver 13d ) 

{ Split ph into separate polynomials 13c ) 
endo 



The number of optimization variables is reduced from S(n — d + 1) + d to S(n — d+1), 
which is beneficial if (nk — d) <C n (the deg ree of the common div i sor is high). This is the 
case, for example, in deconvolution problems (St oica and Soderstromi 119971) . In this case, the 
computational complexity per iteration is also linear in n. 

The symbol (27) in this case equals to F(z) = 2fc=i Is^'H 2 )! 2 ' therefore the computations 
are ill-conditioned if the polynomials g^ (z) have common roots on the unit circle T. Note that 
this is less likely to occur in practice than h having roots on the unit circle. 

4.3. Variable projection with respect to hfor complex polynomials 

Now we assume that the polynomials in (28) are complex. Then they can be represented by 
the vectors 



p(*) = p(*.*) + i.p(^.fc) ) h 



h x + i-h\ 



jW=gW) +i .gM) } 



where i 



— 1. If we set 

pl^colfpW'pW',...,^,^ 
g W = co i U*,i 



1) „(•*,!) 



N) 



(®,N) JJ?,N) 



w y 



col I 



-h J 
Then we have that the problem (28) is equivalent to 



«,W W W 



p(h) 



G 



j^(d+l)x2 



minllA.iC^^pW-gWll^,, 



(33) 



fork = 

as 



d+ld+1 



1 = n — d and v = w. The problem (33) can be equivalently rewritten 

(34) 
/ 3 (PW) := minimum of (33) for fixed P (h) = Afi LS) (P (,l) ). (35) 



minimize fz{P ), where 

^ePd\{o} 



It can be seen that fz(P^) — f3,{P^ ah ^) for any aeC\ {0}. Therefore, fy can be minimized 
on the complex projective plane. The cost function can be evaluated in the software package 



15 



(IMarkovskv and Usevichl 120121) . but minimization over the projective complex plane is not yet 
implemented. 

However, something can be said about conditioning of T. We have that for P = PW 



F(z) 



h m {z~ x ) k*{z- 1 ) 



h m {z) -h y {z) 
h<*( z ) h m {z) 



and 



detF(z) = \h m {z)\ A + \h-*(z)f + 2# {h m (z~ x )bf (z)\ 



onT. 



Therefore, the T is ill-conditioned if the polynomials h (z) and h (z) have common roots on 
the unit circle T. 

4.4. Angles between polynomials as an approximation criterion 

In this section, we consider the problem for finding the distance to the set && (10), in the 
following distance based on angles between vectors 



JY 



dist(p,q) = ^sm 2 (Z(p( fe W fc ))). 



(36) 



fc=i 



(For simplicity we consider only the case of real polynomials.) The distance (36) depends only 
on the roots of the polynomials. By applying the variable projection principle with respect to h, 
the problem becomes minimization of fi(h), where 

N N 

/ 4 (ft):=min^sin 2 (Z(pW,ft-^ fc ))) = ^rmnsrn 2 (Z(pW,M nii _ d _ 1 (^ fc ))). 



s fe=l 



fc=l 9" 



It is well known (Kendal l and Stuartl 1 19771 §17.26), that the least-squares solution minimizes 



(k) 



the angle between the approximating vector and the given vector. Let g* ' be the solution of 
a least squares problem with matrix Au = M, H ._d_i(/i) and right-hand side b = p( k \ Then 



(jj(*0 _ A k gi ')_LAkgi and we have that 



iV 



,1 a ^( k ) „(k)\\2 
f ft\ Y^ \\ A k9* ~P y '\\2 



fe=i 



lb (fe) lli 



"\\ p W-h-gM\\l 

8 fc=l 



:| P (*)||! 



which is an instance of the weighted 2-norm (7). Therefore, minimiz ing the relative distance 
(which is done by many authors, see for example (Bin i andBoitoll2010l) ) is equivalent to mini- 
mizing the distance (36) based on angles. 



5. Optimization in the kernel representation 

5.1. Kernel representation: an overview 

First we will present a reformulation of Problem 1 using Sylvester identity and its general- 
izations. The following lem ma is well-known in the computer algebra community (Rupprecht, 
1999c iKaltofen et all 120061) . We present it in adjusted form (suitable for our definition of P„, 
which allows zero polynomials). 



16 



Lemma 9 ( dKaltofen et all J20061 Lemma 2.1, adjusted)). Let p G P„, u (1) € P ni -d \ {0} and 
m (2) € P„ 2 _d, . . ., u (jv:) G f nN -d such that 

u (k) p (i) _ u W p (k) = 0j where u {k) G P„ fc , Vfc = 2, . . . , N. (u k ) 

Then the polynomials p^ 1 ' , . . . , p' ' have a common divisor of degree at least d. 



Proof. The proof is given in A. □ 

Note that the condition (it/.) is equivalent to S*} (p)u = 0, where 5?\ is the following 
(generalized Sylvester subresultant) matrix: 



^ (1) (P) := 



G R KxL . (37) 



M ni _ d (p( 2 )) -M n2 _ d (pW) 

: '•• 

_M ril _ d (pW) -M nN _ d (pM) 

In the literature it is assumed in (IRupprechn Il999t Kaltofen et all |2006) that we can replace 



Problem 1 with the structured low-rank approximation of S^\ . However, the rank deficiency of 
o5^ ' does not imply in general that deg gcd p > d, because u^ may be zero in t he right kernel 



of S^d jp) (which is required in Lemma 9). A better alternative was proposed bv lAgrawal et al 
d2004 

^ (1) (P) 
^f(p), 



-y d (p) 



, if N > 2, 



if N = 2. 
Here we show that the low-rank approximation of =5^(p) is indeed equivalent to Problem 1. 

Proposition 10. Homogeneous polynomials p = (p^ 1 **, . . . ,p^ N ^) G P n have deggcdp > d if 
and only if J^d(p) is rank-deficient 



Proof. The proof is given in A. □ 



Therefore, Problem 1 is equivalent to the following structured low-rank approximation prob- 
lem. 



Problem 11 (Sylvester low-rank approximation problem). 

minimize ||p — P*ll w subject to ,5^ (p) is rank deficient. 



(38) 



However, in what follows we consider the Sylvester subresultant matrices =5^ (p). 

5.2. Sylvester low-rank approximation as a mosaic Hankel low-rank approximation 

In this subsection, we reduce the low-rank approxima tion of .5^ d (p) to a form that can be 
solved by the methods of (IMarkovskv and Usevichi 120121) . 
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iV 



L 6 :=^ 2 + l + > (4 + i + m) 



fc=3 



A' 



K b := yW+4 + 1), 



fe=2 



v(!) 



'/' '"' ■ = col(0 Kb -ni-i, -P^^Kt-m-i) 



g ( 2 ):=col(0^,p( 2 \0^,p^ 



,04,^,0,, 



( Sylvester GCD parameters 1 8a ) = 

Lb = bfell(2) + 1 + sum (bf ell (3: end) + 1 + bfn(l)); 
Kb = sum(bfn(2:end) + (bfell(l) +1)); 
ql = [zeros (Lb-1, 1) ; -p{l}; zeros (Lb-1, 1) ] ; 
q2 = [zeros (bfell (1) , 1) ] ; 
for k=2 : length (p) 

q2 = [q2; p { k } ; zeros (bfell ( 1 ), 1 )] ; 
endo 

Fragment referenced in 18b, 19b. 



Proposition 12. The matrix generalized Sylvester subresultant matrix can be represented as the 
following mosaic Hankel matrix: 



(^j 1) ( P )) T = $JC,^ 



$ := Jblkdiag ( 



(i) 



7 (2) 



h, 



where m := 



, + l)xd 



Ie 3 +i 0(£ 3+ i) X d 



and 



■ I 



(39) 



, -<£ 2 +£l+2 



Proof. The proof is given in A. □ 

{ Create Sylvester structure 18b ) = 

{ Sylvester GCD parameters 1 8a ) 

s.m = [Lb; bfell (1) +1] ; 

s.n = [Kb] ; 

phiels = cell (length (p) -1) ; 

phiels{end} = eye (bfell (1) +bfell (2) +2) ; 

for k=3 : length (p) 

phiels{length(p) +l-k} = [eye (bfell (k) +1) zeros (bfell (k) +1, d) ] ; 
end 
s.phi = f liplr (blkdiag (phiels {:})); O 

Fragment referenced in 19b, 20. 
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Hence, the problem (38) can be solved as a weighted mosaic low-rank approximation of the 
matrix in (39), if we fix the zero elements. This can be accomplished by putting infinite weights 
in the approximation criterion. 

w = col(col nb ^ ni ,w il \col nb - ni ,col£ 1 ,w^ 2 \col£ 1 ,w^ 3) , . . . ,ool£ 1 ,w < ^ N ',oole 1 ). 
( Create weights for Sylvester structure 19a } = 

if ("exist (w, 'var') | | isempty (w) ) 

w = cellf un (@ (x) {ones (size (x) )} , p) ; 
end 

wl = [Inf * ones (Kb-bfn (1) -1, 1) ; w{l}; Inf * ones (Kb-bfn (1) -1, 1) ] ; 
w2 = [Inf * ones (bfell (1) , 1) ] ; 
for k=2 : length (bfn) 

w2 = [w2; w{k}; Inf * ones (bfell ( 1 ), 1 )] ; 
end 
s . w = [wl ; w2] ; o 

Fragment referenced in 19b. 



Therefore, the problem (38) is equivalent to the following problem 

minimize /Liv($ T u), 

u 

where Jln is defined in (19). The minimization of /lat($ t u) can be performed by the SLRA 
package dSLRl 120131: iMarkovskv and Usevichl 1201 2|) . 



"gcd_syl.m" 19bs 

function [ph, info] = gcd_syl (p, w, d, opt) 

( GCD common parameters 13a ) 

( Sylvester GCD parameters 18a ) 

( Create Sylvester structure 18b ) 

( Create weights for Sylvester structure 19a } 

opt . reggamma = 10e4; 

[ph, info] = slra ( [ql; q2] , s, size ( s . phi, 1 ) -1 , opt); 

ph = ph (s . w~ = Inf ) ; 

{ Split ph into separate polynomials 13c ) 

ph{l} = -ph{l}; 
endo 



5.3. Computational complexity 

The complexity of the computations using the variable projection principle is 0(i 2 nb). In 
terms of degrees of the polynomials, it is 0((n — d) 2 n) if N — 2. Therefore for 2 polynomials 
this approach is beneficial if (rik — d) <C n. In the general case N > 2, the complexity is 0(n 3 ), 
even if (n^ -d)«n. 
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5.4. Initial approximation for the developed methods 

The formulation (38) also has advantages that it yields and initial approximation which can 
be obtained from the following problem: 

minimize d v y y ^' . (40) 

u u'u 

The solution of (40) can be obtained from the unstructured low-rank approximation of the ma- 
trix <5*d(p) (for example, taking the last vector in the SVD, or using the QR decomposition). 
The vector Ulra obtained from the unstructured low-rank approximation can serve as an initial 
approximation for u in (38). 

On the other hand, ulra can be used as an initial approximation for the problem (31), as 
shown by the following lemma. 

Lemma 13. If deggcd(p) = d and .S^div) — 0, then the polynomials uS ' are the factors of 
p( k ) from (uk), ie. p^ = u^h, where h e gcd(p). 



Proof. The proof is given in A. □ 



Since point with deggcd(p) = d represent generic points in Sf„, we may assume that an 
approximation Ulra of the "true" u is an approximation of the cofactors.In order to obtain an 
initial approximation for the problem (29), we can solve a least-squares problem, which yields 
and initial guess for h. This can be summarized in the following two-step algorithm: The follow- 
ing algorithm is used to obtain initial approximation: 

• Compute vllra — last right singular vector of <5^(p) (or y} ' (p)); 

~ N ^ 

• put/io ^argmin £ \\M d (u^)h - pW ||^. 

fe=i 
"g_ini.m" 20= 

function [g] = g_ini (p, d) 
( GCD common parameters 13a ) 

( Create Sylvester structure 18b) 

opt = struct (' disp' , 'off, 'maxiter', ) ; 
[bfph, info] = slra ( [ql; q2] , s, size (s .phi, 1 ) -1, opt); 
g = info . Rh ( : ) ; 
endo 
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'lsdivmult . m" 21a= 

function [h] = lsdivmult (p, d, g) 
{ GCD common parameters 13a ) 

st2 = cumsum ( [0;bf ell] +1) ; 

mm = [ ] ; 

for k=l : length (bfn) 

mm = [mm; multmat (g(st2(k):st2(k+l)-l), d) ] ; 
end 

h = mm \ bfp; 
endo 



6. Numerical examples 

6.1. Example of ill-conditioned polynomials ( mini and Boitol XlOKkueruk 120751) ) 

First we consider Example 4.2 from (IBini and Boitol 120101) (Test 5 from (ITeruiL |2013|)). In 
this test the following two polynomials are considered: 

10 10 

u(x) =Y[(x-x j ), v(x) = Y[(x-Xj), Xj = (-iy(j/2). 



We defi ne p^ = zt(x)/||ii|| 2 and p^> — v{x)/\\u\\2 (note that this differs from normaliza- 
tion in (IBini and Boitol 120101) '). and compare several methods according to Euclidean distance 

( Define test 1 polynomials 21b ) = 



P 

q 



1; 
1; 



for j=l:10 

x_j = (-l)-j * j/2; 

p = conv (p, [-x_j ; 1] ) ; 

q = conv(q, [-x_j+ 10" (-j) ; 1]); 
end 

sccoef = norm(p,2); 
p = p. /sccoef; 
q = q. /sccoef ;o 

Fragment referenced in 25. 



All the methods are started from the same initial approximation computed in Section 5.4. 
In Table 1 we present the results of comparison for d = {1, . . . , 10}, and in Table 2. "LRA" 
stands for using initial approximation in Sectio n 5.4 without optimization (refineme nt). "STLN" 
denoted the figures for the STLN method of iKaltofen etal] d2006l) reported in (JTeruil 120131) 
(scaled by the factor ||u|| 2 ). "UVGCD" denoted the figures for the UVGCD method of lZeng 
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Table 1. Optimal approximations of the methods 



d 


LRA 


STLN 


UVGCD 


IM^/IMg 


KER 


GN+LS 


1 


2.37- 1(T 3 


4.46 ■ 1(T 5 


3.54 • 10~ 16 


3.63- 10~ 16 


1.36 ■ 10~ 16 


8.85 ■ 10~ 16 


2 


1.1- 1(T 4 


5.99 ■ 10" 8 


2.28 -10" 14 


2.28- 10 -14 


2.61 ■ 10~ 17 


6.32 ■ 10" 14 


3 


1.09- 10 -4 


1.7- 10" 9 


1.55 • 10~ 12 


1.55- 10 -12 


1.91 ■ 10~ 16 


3.12 -10" 12 


4 


1.79- 10~ 5 


2.49 ■ 10 _1 ° 


8.53 -10" 11 


3.64 • 10 -6 


1.05- 10~ 14 


1.79- 10" 5 


5 


4.65 • 1(T 5 


4.55 ■ 1CT 9 


4.55 ■ 10~ 9 


4.55 • 10" 9 


1.75 ■ 10~ 5 


4.65 • 10~ 5 


6 


3.3 • 1(T 5 


1.85 ■ 10" 7 


1.86- 10~ 7 





2.75 • 10 -5 


3.3- 10^ 5 


7 


4.96 ■ 10 -4 


7.19 -lO -6 


7.19 ■ 10 -6 


7.19 • 10~ 6 


3.98 • 10" 4 


1.13- 10 -5 


8 


2.89 • 1(T 3 


1.76 ■ 10" 4 


1.76- 10~ 4 


1.76 • 10~ 4 


1.76 ■ 10~ 4 


3.27- 10" 4 


9 


2.2- 10 -2 


4.05 ■ 10" 3 


4.05 ■ 10~ 3 


4.05 • 10 -3 


4.05 • 10" 3 


4.29 ■ 10 -3 


10 


6.67- 10~ 2 


6.67 -10" 2 


6.67 ■ 10~ 2 


6.67 • 10~ 2 


6.67 -10~ 2 


6.67- 10~ 2 



Table 2. Number of iterations of the methods 



d 


1%/IMg 


KER 


GN+LS 


1 


4 


5 


5 


2 


4 


2 


13 


3 


4 


2 


7 


4 


100 


3 


1 


5 


5 


6 


1 


6 





16 


1 


7 


4 


100 


20 


8 


3 


5 


6 


9 


3 


4 


7 


10 


1 


1 


1 



(2008). "IM-r-" denotes the variable projection method w.r.t. h in the image representation (Sec- 



tion 4.1). "IMg" denotes the variable projection method w.r.t. g in the image representation 
(Section 4.2). "KER" stands for the variable projection method in the kernel representation (Sec- 
tion 5.2). "GN+LS" denotes the combination of the Gauss-Newton method and line search, used 
for the optimization step in (Bini and Boito, 2010). (We modified the code of Bini and Boito 
(120 101) in order to obtain the correct stopping criterion.) 



The constraints in the current version of the software in (ISLRl 120131) req uire fc ? ; < U in (17) 



(Thes e constraints are reasonable for mosaic Hankel low-rank approximation (lUsevich and Markovsky 



2013b .) These constraints exclude the cases d £ {6, . . . , 10} in the method of Section 4.1 and 
d € { 1 , . . . , 6} the method of Section 4.2. Hence, the results of the image representation methods 
(IMt and IMg) are combined in one column in Table 1 and Table 2. 

The results in Table 1 show that for all d except 4 (and also 6, which was not computed) 
the methods based on the image representation (Sections 4.1 and 4.2), match the results of 
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the UVGCD method (which are the overall best). Note that for all d (including 4) the methods 



based on image representation improve over the the Gauss-Newton method from (Bi ni and Boito , 



20101) . (However, it should be noted that for d — 4 the image representation method runs into 



maximum number of iterations.) 



The method based on the kernel representation (Section 5.2) fails to produce good results 



for d < 8. For d = 1, . . . , 3, it has ill-conditioned T matrix, and the result is computed with 



regularization of T in the package (SLRl 120131) . This means that the computed approximating 



polynomials may not have a common divisor. 



{ SLRA include directories 23a ) = 

addpath ' . ./. ./slra' ; 
addpath '../.. /slra/doc' ; 
addpath '../.. /slra/grassopt' o 

Fragment referenced in 25, 27. 



{ Prepare arrays 23b } = 

res = zeros (10,7); 

iters = zeros (10, 4); 

res(:,l) = (1:10)'; 

iters (:,1) = (1:10)'; 

res(:,3) = [5.17e-l; 6. 95e-4; 1. 97e-5;2. 89e-6;5.28e-5; . . . 

2.15e-3;8.34e-2;2.04e0;4.70el;7.73e2] / sccoef; 

opt.disp = 'iter'; 
opt.epsgrad = le-20; 


Fragment referenced in 25. 
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{ Compare new methods 24 ) = 

for d=l:5 

[ph, info] = gcd_nls ( {p, q} , [], d, opt); 

res(d,5) = norm([p;q] -cell2mat (ph) , 2 ) ; 

iters (d, 2) = info. iter; 
end 

for d=7:10 

[ph, info] = gcd_cof e ( {p, q} , [], d, opt); 

res(d,5) = norm([p;q] -cell2mat (ph) , 2 ) ; 

iters (d, 2) = info. iter; 
end 

for d=l:10 

[ph, info] = gcd_syl ( {p, q} , [], d, opt); 

res (d, 6) = norm([p;q] -cell2mat (ph) , 2 ) ; 

iters (d, 3) = info. iter; 
end 
o 

Fragment referenced in 25. 



24 



'test_exl .m" 25= 

{ SLRA include directories 23a ) 
addpath ' f astgcd' ; 
addpath ' uvgcd' ; 

( Define test 1 polynomials 21b ) 

( Prepare arrays 23b ) 

( Compare new methods 24 ) 

for d=l:10 

gini = g_ini({p,q}, d) ; 

hh = lsdivmult ( {p, q} , d, gini)'; 

glh = gini (1 : length (p) -d) ' ; 

g2h = gini (length (p) -d + ( 1 : length (q) -d) )' ; 

res(d,2) = norm([p;q] - [conv (hh, glh) ' ; conv (hh, g2h) ' ] , 2 ) ; 
[hh, glh, g2h, resn, iter] = c_f_newton_iter_mod (p' , q' , ... 

hh,glh,g2h, 1, 100, le-15) ; 
iters (d, 4) = iter; 
res(d,7) = norm([p;q] - [ conv (hh, glh) ' ; conv (hh, g2h) ' ] , 2 ) ; 

[hh, glh, g2h, uvres, uvcond] = uvGCDf ixedDegree (p, q, d) ; 
res(d,4) = norm([p;q] - [conv (hh, glh) ; conv (hh, g2h) ] , 2 ) ; 
end 

saveCres_terui5.txt', 'res', '-ascii'); 
saveCiter_terui5.txt', 'iters', ' -ascii' );o 



6.2. Comparison of the speed of the computations 

In this section we consider the following family of polynomials (which is similar to the one 
used in dBini and Boitoll20ld. § 4.6)): 



q^ k \z):=h{z)g^ k \z), q™ :=h(z)g™{z), 

h := z 4 + 10z 2 + z - 1, 

<? (M) 0) := {z Ml - l)(z" 2 - 2){z kn3 - 3), 

g {2 ' k) (z) := {z kil + l)(z M2 + b){z kn3 + 2), 

where t\ — 25, 1% = 15, £3 = 10. The test polynomials are defined to be 

where each e^h is a single realization of the Gaussian zero-mean random vector with the covari- 
ance matrix 1CP 4 /. 
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{ Define test 2 polynomials 26a } 



nl = 25*k; n2 = 15*k; n3 = 10*k; 

gl = conv (conv ( [-1; zeros (nl - 1, 1 ) ; 1], ... 

[-2; zeros (n2 - 1, 1) ; 1]), [-3; zeros (n3 - 
g2 = conv (conv ([ 1; zeros (nl - 1, 1); 1], ... 

[5; zeros (n2 - 1,1) ; 1]), [2; zeros (n3 - 1, 
p = conv(gl, hO); q = conv(g2, hO); 
s i gma = 1 e - 2 ; 

p = p + sigma * randn (size (p) ) ; 
q = q + sigma * randn (size (q) ) ; 
O 



l ) ; l ] ) 
; l ] ) ; 



Fragment referenced in 27. 



We consider the test polynomials for k = {1, . . . , 8}, thus the degrees of the polynomi- 
als range between 50 and 40 0. We compare two m ethods: "IMr" (Section 4.1) and "GN" (the 
Gauss-Newton method from JBini and Boitol uOlOl) . without line search). We limit the number 
of iterations to 1 in order to measure the time needed per iteration. All the methods are started 
from the same initial approximation. In Fig. 1, the time needed to perform 1 iteration of the local 
optimization procedure is plotted versus k. 
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Fig. 1 . Comparison of the computational complexity of the methods 

It can be seen from Fig. 1, the time growth is sublinear in k for the variable projection method, 
which confirms the results of Section 4. The time growth for the Gauss-Newton-based method is 
similar to quadratic. Note that the time needed for one iteration of the local optimization is of the 
same order as the total time reported in (IBini and Boitol 1201 01) (including initial approximation), 

and therefore cannot be neglected. 

"meas_f unc . m" 26b= 

function t = meas_f unc (f unc, pars) 
tic; eltm = toe; n = 0; 
while (eltm < 1) 
f unc (pars { : } ) ; 
n = n+ 1; 
eltm = toe; 
end 

t = eltm / n; 
endo 
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"test_speed.m" 27= 

{ SLRA include directories 23a ) 
addpath ' f astgcd' ; 

ntest = 8; 

times = zeros (ntest, 3) ; 
iters = zeros (ntest, 3) ; 
res = zeros (ntest, 3) ; 

hO = [-1; 1; 0; 10; 1] ; 
d = length (hO) - 1 ; 

times(:,l) = 1: ntest; 
iters(:,l) = l:ntest; 
res(:,l) = 1: ntest; 
for k=l:ntest 

k 

( Define test 2 polynomials 26a ) 

gini = g_ini({p,q}, d) ; 

h = lsdivmult ( {p, q} , d, gini)'; 

gl = gini ( 1 : length (p) -d) ' ; 

g2 = gini (length (p) -d + (1 : length (q) -d) )' ; 

hh = h; glh = gl; g2h = g2; 

opt . hini = hh' ; 
opt.epsgrad = le-20; 
opt.maxiter = 1; 

[ph, info] = gcd_nls ( {p, q} , [], d, opt); 
res(k,2) = norm([p;q] -cell2mat (ph) , 2 ) ; 
iters(k,2) = info. iter; 

[hh, glh, g2h, resn, iter] = c_f_newton_iter_mod (p' , q' , ... 

hh, glh, g2h, 0, opt . maxiter+1 , 0); 
iters (k, 3) = iter-1; 
res(k,3) = norm([p;q] - [ conv (hh, glh) ' ; conv (hh, g2h) ' ] , 2 ) ; 

hh = h; glh = gl; g2h = g2; 

times (k, 2) = meas_f unc (@gcd_nls, {{p,q}, [], d, opt } ) ; 

times (k, 3) = meas_func (@c_f_newton_iter_mod, {p',q', ... 

hh, glh, g2h, 0, opt .maxiter+1 , } ) ; 
end 

saveCres_speed.txt', 'res', '-ascii'); 
saveCiter_speed.txt', 'iters', '-ascii'); 
saveCtimes_speed.txt', 'times', ' -ascii' );o 
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6.3. Singularity ofT matrix for N > 2 in kernel representation 



In this subsection we show that on a particular example of N = 3 polynomials that in the ker- 
nel representation approach the corresponding T matrix in (20) is essentially singular. Consider 
three polynomials pW , p^ , p^ <E P2, and d = 1. Then the corresponding generalized Sylvester 



^1 (1) (P) 



subresultant matrix is 

Mi(p( 2 )) -Mi(pM) 

Mx(p( 3 )) -M^p' 1 ') 

By Proposition 12 and the results of (lUsevich and Markovskvl 120131) . we have that the corre- 
sponding r matrix in (20) has the form T(u) = G(u)G T (u), where u € Pf is the annihilating 
vector from (9), where 



G(u) = 






Mj^ 1 ') 



M x (u 



(1) 



It can be easily checked that the polynomial matrix T(u) has (symbolic) determinant 0. This is 
also confirmed by running the kernel representation optimization method. 



"test det.m" 28e 



syms uO ul vO vl wO wl; 



ul uO 0; ul uO 
vl vO 0; vl vO 
wl wO 0; wl wO 



mmu = [uO 

mmv = [vO 

mmw = [wO 

sylv = [-mmv mmu zeros (4,3), ■ 

det(sylv * (sylv')) 

endo 



ul] 
vl] 
wl] 



-mmw zeros (4, 3) mmu] 



7. Conclusions 

We have developed methods based on the variable projection principle, for optimization in 
the image and kernel representations. The advantages of the developed methods are that they 
have proven complexity results, and have available implementation that allows to use different 
optimization methods and different stopping criteria. 

The methods for optimization in the image representation have linear complexity in the de- 
grees of the polynomials if the degree of the common divisor d is small or if d is large. The 
methods provide accurate results matching the accuracy of other existing methods. The meth- 
ods based on the kernel representation have higher computational complexity and have issues of 
intrinsic singularity of the T matrix for N > 2 and ill-conditioning of T when the degree d is 
small. 

The advantage of image representation methods is that even if the optimization has not con- 
verged, the tentative cofactors and common divisors are available and provide a desired approx- 
imation (a certificate). This is in contrast with kernel representation methods (including the vari- 
able projection method from this paper and the STLN method), where the polynomials having a 
common divisor are obtained only upon convergence of an optimization method. 
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A. Proofs 

Lemma 14. 

_ J & d , ifW = C or (F = R and d is even), 

d ~ I & d U &d+i, if® = ^andd is odd. 

Proof. [Proof of Lemma 14] Evidently, ,^ d C %, &d+i C % and e ^ d . Let p = 
(pW, . . . ,pW) e % \ {0} and ft e F d , be a GCD of the pW, where d' > d. Then, ft can 

be factorized as 

d' 

h(z) =^(z-a k ), 
fe=i 
where a fe € CU{oo}.IfF = C then the p^ have a common divisor of degree d and p <G ^ d . If 
F = R, then every a^ has its conjugate counterpart. Hence, p have common divisors of degrees 
21 for any I : < 21 < d' . Therefore, p e J^ if d is even and p <G ^d+i if d is odd. □ 

Lemma 15. For any < d < n m i n , the sets &d an d % are closed subsets o/P n . 

Proof. [Proof of Lemma 15] Denote by S d C Pd the polynomials with 2-norm equal to 1, Then 
&d is the image of the infinitely smooth map P n _d x Sd —¥ P n defined as (g, ft) n- g • ft. Since 
the domain of the definition is closed and the map is continuous, & d is closed. 

By Lemma 14, % can be expressed as a union of at most two sets & dl and J^ 2 , and therefore 
it is also closed. □ 

Proof. [Proof of Lemma 9] If p^ — 0, then all polynomials p^ are zero by (uk). 

It is left to consider pW ^ 0. The case when all other polynomials are zero is trivial. Without 
loss of generality assume that there exists 2 < K < N such that p^ ^= for all 2 < k < N. 
Then we need to prove that deg gcd(p^ , . . . ,p^ K ^) > d. 

Since wWpW =^ 0, we have that u^ ^= for k > 2. Let us rewrite (uk) as 

pW_p^_ _ pW _ a 

u (l) - „(2) -•••- u(K) - b , 

where a/b is an irreducible fraction. Then we have that 

(k\ uik)a 
P (k) = — b ~, (A.2) 

and u^ should be divisible by b. Therefore, p^ have a common divisor a, with deg a > d, 
which can be established by counting dimensions in (A. 1). □ 



(A.l) 
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Proof. [Proof of Proposition 10] The "only if" part is trivial. Indeed, one can construct u, as in 
Lemma 9. 

In order to prove the "if", let us look at the shifted Sylvester matrices 



y!t\p (1 \,---,p (N) )--=y! 1 i \p {k \p {1 \---J k - 1 \p {k+l \---J N) ) 



In 



where Ik := Hfe — d+1, mi := ^((hi ■ ■ ■ , h-i)) an d m 2 '■= ^((h+i, ■ ■ ■ , In))- Note that S? d 
coincides with the definition in (37) For example, for three polynomials we have that 



(fc) 



^fVW 3) ) 

yP(p {1 \p {2) , P {3) ) 



M„ 2 _ d (p( 3 )) -M„ 3 _ d (p( 2 )) 

-M ni _ d (p^) M„ 3 _ d (pW) 

-M„ 2 _ d (p( 3 )) M n3 _ d (p&) 



Note that any matrix S^\ (p) can be extracted from the matrix =5*d(p) by selecting correspond- 
ing block rows. Therefore, if u € P n -d is a right annihilating vector u <G P n -d> it is a lso 
annihilating vector of S^} ' (p). Let us select k such that u^ k ' ^ 0. Then we have that 

^ (1 V fe) ,P (1) , • • • ,P (fc - 1) ,P (fe+1) , • . • ,P W ) col(u< fc \ « (1) , . . . , u**" 1 ), u (fe+1) , . . . , U (Ar) ) = 0, 
and by Lemma 9, the polynomials have deg gcd > d. □ 



Proof. [Proof of Proposition 12] For p G P„ and £ we have 

M m (p) T = ^+i,i> +n+ i (col(0^,p,0<)) . 
If we denote £& := n^ — d, then 

Jf iN+1 (col{0e N ,p^\0 iN )) 

o ••' 

^ 2+1 (col(0, 2 ,;p« 0, 2 )) 

M 1+1 (co\(0 ei ,p( 2 \0 ll )) ... J^ 1+ i(col(0^,pW,0^)) 



J^d(p) T = 



^jv + 1 0(£ N + l)xd 



^3 + 1 0(^3 + 1) Xd 



«2 + l 



/; 



&+1 



^ 1+ i(9 (2) ) 



n 
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Proof. [Proof of Lemma 13] If deggcd(p) = d, then b from (A.l) should be necessarily in Po 
(otherwise, by (A. 2), the polynomials would have a common divisor of degree higher than d). 
Therefore, u^ are factors of p^ by (A.2). □ 
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